Task Loss Estimation for Sequence Prediction

Authors

  • Dzmitry Bahdanau
  • Dmitriy Serdyuk
  • Philemon Brakel
  • Nan Rosemary Ke
  • Jan Chorowski
  • Aaron C. Courville
  • Yoshua Bengio
Abstract

Often, the performance on a supervised machine learning task is evaluated with a task loss function that cannot be optimized directly. Examples of such loss functions include the classification error, the edit distance and the BLEU score. A common workaround for this problem is to instead optimize a surrogate loss function, such as cross-entropy or hinge loss. For this remedy to be effective, it is important to ensure that minimization of the surrogate loss results in minimization of the task loss, a condition that we call consistency with the task loss. In this work, we propose another method for deriving differentiable surrogate losses that provably meet this requirement. We focus on the broad class of models that define a score for every input-output pair. Our idea is that this score can be interpreted as an estimate of the task loss, and that the estimation error may be used as a consistent surrogate loss. A distinct feature of such an approach is that it defines the desirable value of the score for every input-output pair. We use this property to design specialized surrogate losses for Encoder-Decoder models, which are often used for sequence prediction tasks. We evaluate the approach on the task of speech recognition. Using the new surrogate loss instead of cross-entropy to train an Encoder-Decoder speech recognizer brings a significant 9% relative improvement in terms of Character Error Rate (CER) in the case when no extra corpora are used for language modeling.
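The central idea of the abstract, treating the score of each input-output pair as an estimate of the task loss and training on the estimation error, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the task loss is the edit distance, the estimation error is a sum of squared differences over a small hand-picked candidate set, and the scores are given as a plain dictionary rather than produced by an Encoder-Decoder model.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (the task loss in this sketch)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def estimation_error(scores, candidates, target):
    """Hypothetical surrogate: squared gap between score and true task loss."""
    return sum((scores[y] - edit_distance(y, target)) ** 2 for y in candidates)


def predict(scores, candidates):
    """Choose the candidate with the lowest score, i.e. lowest estimated loss."""
    return min(candidates, key=lambda y: scores[y])


if __name__ == "__main__":
    target = "cat"
    candidates = ["cat", "cut", "dog", "cart"]

    # Scores that match the task loss exactly: the surrogate is zero.
    exact = {y: float(edit_distance(y, target)) for y in candidates}
    print(estimation_error(exact, candidates, target), predict(exact, candidates))

    # Imperfect (made-up) scores: the surrogate is positive but small.
    noisy = {"cat": 0.4, "cut": 1.2, "dog": 2.7, "cart": 0.9}
    print(estimation_error(noisy, candidates, target), predict(noisy, candidates))
```

In this sketch, a surrogate value of zero means every score equals the corresponding task loss, so picking the lowest-scoring candidate also picks a candidate with the lowest task loss. This is only the intuition behind the consistency property described above, not a proof of it.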

Similar resources

Seismic Data Forecasting: A Sequence Prediction or a Sequence Recognition Task

In this paper, we have tried to predict earthquake events in a cluster of seismic data on the Pacific Ring of Fire, using multivariate adaptive regression splines (MARS). The model is employed as either a predictor for a sequence prediction task, or a binary classifier for a sequence recognition problem, which could alternatively help to predict an event. Here, we explain that sequence prediction/r...

Bayes, E-Bayes and Robust Bayes Premium Estimation and Prediction under the Squared Log Error Loss Function

In risk analysis based on the Bayesian framework, premium calculation requires specification of a prior distribution for the risk parameter in the heterogeneous portfolio. When the prior knowledge is vague, E-Bayesian and robust Bayesian analyses can be used to handle the uncertainty in specifying the prior distribution by considering a class of priors instead of a single prior. In th...

STATISTICAL PREDICTION OF THE SEQUENCE OF LARGE EARTHQUAKES IN IRAN

Different probability distributions, described by the Exponential, Pareto, Lognormal, Rayleigh, and Gamma probability functions, are applied to estimate the time of the next great earthquake (Ms≥6.0) in different seismotectonic provinces of Iran. This prediction is based on the information about past earthquake occurrences in the given region and the basic assumption that future seismi...

A method of performance estimation for axial-flow turbines based on losses prediction

The main objective of this paper is to create a method for one-dimensional modeling of multi-stage axial-flow turbines. The calculation used in this technique is based on common thermodynamic and aerodynamic principles in a mean streamline analysis. In this approach, loss models have to be used to determine the entropy increase across each section in the turbine stage. Finally, the analysis an...

Tighter Bounds for Structured Estimation

Large-margin structured estimation methods work by minimizing a convex upper bound of loss functions. While they allow for efficient optimization algorithms, these convex formulations are not tight and sacrifice the ability to accurately model the true loss. We present tighter non-convex bounds based on generalizing the notion of a ramp loss from binary classification to structured estimation. ...

Journal:
  • CoRR

Volume: abs/1511.06456

Publication date: 2015